Identification of Novel Prognostic Biomarkers for Colorectal Cancer by Bioinformatics Analysis

Background/Aims: Colorectal cancer (CRC) ranks third among malignancies in terms of global incidence and has a poor prognosis. The identification of effective diagnostic and prognostic biomarkers is critical for CRC treatment. This study intends to explore novel genes associated with CRC progression via bioinformatics analysis. Materials and Methods: Dataset GSE184093 was selected from the Gene Expression Omnibus database to identify differentially expressed genes (DEGs) between CRC and noncancerous specimens. Functional enrichment analyses were implemented for probing the biological functions of DEGs. Gene Expression Profiling Interactive Analysis and Kaplan–Meier plotter databases were employed for gene expression detection and survival analysis, respectively. Western blotting and real-time quantitative polymerase chain reaction were employed for detecting molecular protein and messenger RNA levels, respectively. Flow cytometry, Transwell, and CCK-8 assays were utilized for examining the effects of GBA2 and ST3GAL5 on CRC cell behaviors. Results: There were 6464 DEGs identified, comprising 3005 downregulated DEGs (dDEGs) and 3459 upregulated DEGs (uDEGs). Six dDEGs were significantly associated with the prognoses of CRC patients, including PLCE1, PTGS1, AMT, ST8SIA1, ST3GAL5, and GBA2. Upregulating ST3GAL5 or GBA2 repressed the malignant behaviors of CRC cells. Conclusion: We identified 6 genes related to CRC progression, which could improve the disease prognosis and treatment.


INTRODUCTION
Colorectal cancer (CRC) ranks third among malignancies in terms of global incidence and is most commonly observed between 40 and 50 years of age. 1 Colorectal cancer accounts for almost 50% of all incident cases in men and is the second primary cause of mortality for cancer in the United States. 2 In China, estimates from 2022 showed that there were approximately 592 232 new CRC cases and 309 114 deaths attributed to the disease. 3olorectal cancer is usually asymptomatic at the early stage; the 5-year survival rate is over 90% at the early stage but is only approximately 12% at advanced stages. 4ost patients receive a diagnosis of CRC at advanced stages, when the optimal opportunity for treatment has passed, due to a lack of effective diagnostic approaches. 5espite the advances in techniques, the prognosis for CRC patients diagnosed with metastasis is far from satisfactory. 6Hence, for early detection and prognosis prediction of CRC, identifying novel biomarkers is urgently needed.
With the development of sequencing technologies and large-scale sequence research, numerous sequencing data sets have been collected in various databases.Bioinformatics analysis is a critical method for comprehensively examining large databases that contain complex genetic information. 7Numerous studies have employed bioinformatics analysis to identify and analyze genes involved in cancer pathogenesis and progression.The findings of differentially expressed genes (DEGs) provide substantial and valuable information for the study of CRC, helping to improve the early detection, therapeutic strategies, and prognosis of CRC. 8 For example, COL12A1 and its homologous family proteins such as COL1A2, COL3A1, and COL5A1, are considered prognostic biomarkers for CRC. 9 Moreover, studies have identified many immune-related prognostic markers that may reflect the immune dysregulation in the tumor microenvironment of CRC. 10 Despite these findings, CRC remains a complicated disease, and it is still imperative to determine more predictive markers to improve screening and treatment of the disease.Herein, we selected the dataset GSE184093 from the Gene Expression Omnibus (GEO) database for identifying DEGs between CRC and adjacent non-tumor samples.Functional enrichment analyses were conducted to investigate the potential functional roles of the DEGs, and genes enriched in the top-ranked KEGG pathways were selected for further analyses with bioinformatics tools, aiming to research the molecular mechanisms underpinning CRC progression and to identify potential biomarkers with diagnostic and prognostic values for the disease.

Identification of Differentially Expressed Genes
The interactive online tool GEO2R (https ://ww w.ncb i.nlm .nih.gov/g eo/ge o2r/) was employed for DEG identification between CRC and non-cancerous samples.The volcano plot was generated to depict the results of gene selection.The Benjamini & Hochberg method (false discovery rate) was employed for adjusting the P-values.Genes meeting the thresholds of |log2 fold change (FC)| > 0.5 and adjusted P (P adj.) < .05were selected as DEGs.

Functional Enrichment Analysis
To preliminarily investigate the functions of the identified DEGs, Database for Annotation, Visualization, and Integrated Discovery (DAVID) (https ://da vid.n cifcr f.gov /) was employed to conduct Gene Ontology (GO) and KEGG enrichment analyses.The GO analysis included 3 categories, namely, cellular component (CC), molecular function (MF), and biological process (BP).P < .05 was set as the threshold.The DEGs enriched in the most significant KEGG pathways were further mapped to obtain the overlapping genes.

Identification of Overlapping Differentially Expressed Genes
The Gene Expression Profiling Interactive Analysis (GEPIA) database (http: //gep ia.ca ncer-pku.cn/) includes 92 rectum adenocarcinoma (READ) samples and 275 colon adenocarcinoma (COAD) samples from The Cancer Genome Atlas (TCGA).DEGs between TCGA-COAD/ READ and matched normal samples were downloaded from the GEPIA website using the limma method.The online Venn diagram tool (http: //bio infor matic s.psb .ugent.be/ webto ols/V enn/) was utilized to obtain the overlapping DEGs in TCGA-COAD, TCGA-READ, and the most significant KEGG pathways.

Protein-Protein Interaction Networks
Search Tool for the Retrieval of Interacting Genes, version 11.5 (STRING, https ://cn .string-db .org/), 12 was employed for constructing protein-protein interaction (PPI) networks.Medium confidence score of ≥0.4 was used as the parameter.Gene interactions were represented by nodes and edges.Genes with more interactions with other genes are considered more important.

Survival Analysis
The DEGs filtered into PPI networks were further analyzed using Kaplan-Meier plotter (https ://km plot.com/ analys is/in dex.p hp?p= servi ce&ca ncer= panca ncer_ rnaseq) to determine their prognostic values in CRC.Kaplan-Meier plotter database contained only RAN-seq data from READ samples (n = 165), not from COAD samples.Log-rank P < .05depicted statistical significance.

•
The top-ranked Kyoto Encyclopedia of Genes and Genomes pathway for the upregulated genes is cell cycle, and that for the downregulated genes is metabolic pathways.Lipofectamine 3000 (Invitrogen, Carlsbad, Calif, USA).Cells were collected after transfection of 48 hours for further experiments.

Real-Time Quantitative Polymerase Chain Reaction
TRIzol reagent (Invitrogen) was employed for total RNA isolation.Preparation of cDNA was achieved using the iScript cDNA Synthesis Kit (Bio-Rad, Hercules, Calif, USA).Real-time quantitative polymerase chain reaction (RT-qPCR) was conducted on a CFX96 Touch RT-PCR system (Bio-Rad) with SYBR® Premix Ex Taq™ II (Takara, Dalian, China).With normalization to GAPDH, the 2 −ΔΔCt method was employed for determining relative gene expression.Primers are shown in Table 1.

Flow Cytometry
An Annexin V-FITC/PI Apoptosis Detection Kit (Beyotime) was utilized for apoptotic cell detection.In short, cells were washed with phosphate-buffered saline, resuspended in binding buffer (Beyotime), followed by treatment with annexin V-FITC (5 μL) and propidium iodide (10 μL) without light for 10 minutes.Apoptotic cells were detected using a flow cytometer (FACScan, BD Biosciences, Franklin, NJ, USA).

Cell Counting Kit-8 Assay
Colorectal cancer cells (2 × 10 3 /well) were inoculated into 96-well plates and incubated at 37°C for the indicated time period.Afterwards, each well was added with CCK-8 reagent (10 μL, Beyotime) before additional 2-hour culture.The 450 nm optical density was estimated with a microplate reader (Bio-Rad).

Transwell Assay
A Transwell chamber (24-well, 8-μm pore size; Corning Inc., Corning, NY, USA) was employed for examining tumor cell invasion and migration.For evaluation of invasiveness, the upper chamber was precoated with Matrigel (100 μL; Beyotime), and CRC cells (1 × 10 4 ) in serum-free medium (200 μL) were added.The bottom plates were added with complete medium (500 μL) with 10% FBS.After 24 hours, cells that remained in the upper side were gently wiped off using cotton swabs, while those invaded to the bottom plates were dyed with crystal violet for 10 minutes after fixing in 4% paraformaldehyde.For the evaluation of migration, the upper side was not precoated with Matrigel, and other procedures were similar to those of the invasion assay.
The results were depicted with a microscope, and 5 randomly selected fields were used to count invaded and migrated cells.

Statistical Analysis
Data are expressed as mean ± standard deviation.Each experiment was repeated at least 3 times.Difference comparisons between groups were performed by the Student's t-test or one-way analysis of variance, followed by Tukey's post hoc analysis using GraphPad Prism 8.0.2 software (GraphPad, San Diego, Calif, USA).A P < .05indicated statistical significance.

Functional Enrichment Analysis
Gene Ontology and KEGG enrichment analyses were performed on the 6464 DEGs to determine their possible functions.The top 5 significant terms and pathways enriched by the uDEGs and dDEGs were screened out.As displayed in Table 2, the uDEGs were mainly enriched in cell division, rRNA processing, DNA replication, cytoplasmic translation, and mitotic sister chromatid segregation in the BP category; nucleus, cytosol, nucleoplasm, kinetochore, and cytosolic ribosome in the CC category; and protein binding, RNA binding, ATPase activity, structural constituent of ribosome, and DNA replication origin binding in the MF category.As for the dDEGs, they were significantly enriched in signal transduction, adaptive immune response, B cell receptor signaling pathway, phagocytosis, and engulfment in the BP category; plasma membrane and so on in the CC category; and antigen binding, immunoglobulin receptor binding, and beta-amyloid binding in the MF category.
Moreover, as depicted by KEGG results, the uDEGs were markedly enriched in cell cycle, ribosome, spliceosome, nucleocytoplasmic transport, and DNA replication, while the dDEGs were predominantly enriched in metabolismlinked pathways like metabolic pathways, the cAMP signaling pathway, and fatty acid degradation.In general, the most significantly enriched KEGG pathway for the uDEGs was cell cycle, and that for the dDEGs was metabolic pathways, with 52 (Supplementary Table 1) and 275 genes (Supplementary Table 2) enriched, respectively.

Identification of the Overlapping Differentially Expressed Genes
Using GEPIA database, 5356 and 5813 DEGs were obtained from TCGA-COAD and TCGA-READ, respectively, including 2682 uDEGs and 2674 dDEGs in COAD (Figure 2A and 2B), as well as 2874 uDEGs and 2939 dDEGs in READ (Figure 2A and 2B).The web tool Venn diagrams was utilized to identify the overlapping DEGs from the 3 cohorts: COAD, READ, and cell cycle/metabolic pathways.As a result, 36 of the 52 genes that were enriched in cell cycle were validated to be upregulated in both COAD and READ samples, and 45 of the 275 genes enriched in metabolic pathways were validated to be downregulated.

Construction of Protein-Protein Interaction Networks
The PPI networks were constructed using the STRING online database to investigate the interactions among the overlapping DEGs.As shown by the results, the PPI network for the overlapping uDEGs contained 36 nodes and 527 edges (Figure 3), while that for the overlapping dDEGs consisted of 34 nodes and 50 edges (Figure 4).

Validation of Gene Prognostic Value
Next, we further verified the prognostic values of these nodes/DEGs using Kaplan-Meier plotter database.The samples were segregated into high-and low-expression groups based on the gene median value.Notably, the mRNA levels of 6 dDEGs were prominently associated   with READ patients' prognosis, including phospholipase C epsilon 1 (PLCE1), cyclooxygenase 1 (PTGS1), aminomethyltransferase (AMT), ST8 alpha-N-acetyl-neuraminide alpha -2,8-sialy ltran sfera se 1 (ST8SIA1), ST3GAL5, and GBA2.A lower mRNA level of the 6 genes predicted a worse overall survival of patients (Figure 5).

Validation of Gene Expression in Colorectal Cancer
Cell Lines Subsequently, RT-qPCR was conducted to detect gene expression in CRC cell lines.As depicted by the results, there was no marked difference in PTGS1 and AMT expression between tumor and normal cell lines (Figure 6A and 6B).The ST8SIA1 mRNA level in CRC cells was prominently higher than that in NCM460 cells (Figure 6C), which was in conflict with the microarray analysis.Moreover, the mRNA levels of PLCE1, GBA2, and ST3GAL5 were markedly reduced in CRC cell lines (Figure 6D-F), which were consistent with the microarray results.Additionally, PLCE1 has previously been shown to suppress the malignancy of CRC cells. 13Therefore, GBA2 and ST3GAL5 were selected for further analyses.Similar to RT-qPCR, western blotting displayed significantly decreased levels of GBA2 and ST3GAL5 proteins in CRC cell lines (Figure 6G).

The Effect of Gene Overexpression on Colorectal Cancer Cell Malignancy
To determine the specific roles of GBA2 and ST3GAL5 in affecting the biological behaviors of CRC cells, we separately overexpressed GBA2 and ST3GAL5 in 2 CRC cell lines (LoVo and SW620) using their corresponding overexpression vectors (GBA2-OE or ST3GAL5-OE).The transfection of GBA2-OE resulted in an elevation in GBA2 expression in CRC cells (Figure 7A and 7B).Similar results were shown in CRC cells with ST3GAL5-OE transfection (Figure 7C and 7D).These data confirmed that GBA2 and ST3GAL5 were successfully overexpressed in CRC cells.

DISCUSSION
In the present study, we identified 3459 uDEGs and 3005 dDEGs between normal and CRC samples from the dataset GSE184093.Functional enrichment analysis depicted that the majority of these genes were prominently related to cell cycle and metabolic pathways, respectively (Table 2).Subsequently, the GEPIA database was utilized to validate the DEGs enriched in cell cycle and metabolic pathways.As a result, we identified 36 uDEGs and 45 dDEGs (Figure 2) and then constructed PPI networks of these DEGs, which revealed the interaction between different proteins and helped to screen out core regulatory genes.The survival analysis revealed that 6 of the DEGs filtered into PPI networks were prominently associated with CRC patients' prognosis, including PTGS1, AMT, PLCE1, ST8SIA1, ST3GAL5, and GBA2 (Figure 5).The RT-qPCR analysis showed that PLCE1, ST8SIA1, ST3GAL5, and GBA2 were aberrantly regulated between the tumor and nontumor cell lines, while the results of PTGS1 and AMT showed no significance (Figure 6).In vitro experiments showed that overexpressing ST3GAL5 or GBA2 suppressed the malignant behaviors of CRC cells (Figures 8 and 9).The DEG PTGS1, also known as cyclooxygenase 1 (COX-1), encodes an enzyme catalyzing arachidonate conversion to prostaglandin.Several studies have shown that PTGS1/COX1 is upregulated in CRC tissues, and inhibition of COX-1 can suppress CRC cell aggressiveness. 14ontrarily, PTGS1/COX1 downregulation in CRC was also reported, 15 supporting the microarray results in this study.
The DEG AMT (amin ometh yltra nsfer ase) is a component of the glycine cleavage system termed T-protein.
Mutations in this gene in humans lead to nonketotic hyperglycinemia. 16Proteomic analysis reveals that AMT protein expression can be upregulated by the effect of retinoic acids in breast cancer cells, which is associated with tumor cell apoptosis. 17Here, we identified that AMT was downregulated in CRC specimens and that its higher level was significantly linked to better overall survival in CRC patients.In addition, among the 6 DEGs identified, AMT has the best 95% CI range (0.12-0.66) in the survival analysis, indicating its potential significance in CRC prognosis.However, genes PTGS1 and AMT were not validated at the cell line level, which may be ascribed to the limitations of cell lines, highlighting the need for further research.The phospholipase C epsilon 1 gene, encoding a phospholipase enzyme, catalyzes phosp hatid ylino sitol -4,5-bisphospha te hydrolysis to produce two secondary messengers: diacylglycerol (DAG) and inositol 1,4,5-triphosphate (IP3).Inositol 1,4,5-triphosphate and DAG subsequently regulate various processes that affect gene expression, cell growth, and differentiation. 18vidence indicates that PLCE1 is tightly related to the clinical staging and survival of CRC patients. 19Importantly, PLCE1 has been indicated to exert an inhibitory effect on the incidence of CRC.PLCE1 is lowly expressed in CRC samples, and its overexpression markedly suppresses CRC cell proliferation, 13 which was consistent with our results.ST8SIA1 encodes a type II membrane protein catalyzing the synthesis of ganglioside GD3, which is critical for cell growth and adhesion of cultured tumor cells. 20revious research has demonstrated the involvement of ST8SIA1 in several cancers.For example, in bladder cancer, ST8SIA1 suppresses cell proliferative, invasive, and migratory capabilities by inhibiting the JAK/STAT3 pathway. 21ST8SIA1 knockout dramatically blocks tumor metastasis and growth of triple-negative breast carcinoma via the FAK-AKT-mTOR signaling pathway. 22These indicate that ST8SIA1 may work as a tumor activator or suppressor depending on cancer types.Intriguingly, evidence has demonstrated ST8SIA1 upregulation in CRC tissues and that upregulating ST8SIA1 could attenuate miR-3 3a/ let-7e-media ted suppressive effects on cell proliferation and chemoresistance in CRC. 23Consistent with this, our experimental data depicted that the ST8SIA1 level in CRC cell lines was much higher than in the nontumor cell line.
The ST3GAL5 gene also encodes a type II membrane protein that converts lactosylceramide into GM3.ST3GAL5 has been verified to be implicated in multiple malignancies, such as bladder cancer and hepatocellular carcinoma. 24,25Previous evidence has suggested that ST3GAL5 is overexpressed in rectal carcinoma samples compared to normal samples. 26On the contrary, this study revealed that ST3GAL5 was underexpressed in CRC samples and that overexpressing ST3GAL5 led to reduced aggressiveness of CRC cells in vitro.GBA2 encodes a microsomal beta-glucosidase catalyzing the hydrolysis of bile acid 3-O-glucosides as endogenous compounds.GBA2 can mitigate the tumorigenicity of human melanoma cells. 27owever, the expression and role of GBA2 in CRC have never been studied.Similar to previous reports, our study depicted that GBA2 upregulation repressed the malignant behaviors of CRC cells, indicating that GBA2 might work as a tumor suppressor in CRC.In addition, functional enrichment analysis showed that both ST3GAL5 and GBA2 were enriched in metabolic pathways.This suggests that overexpression of these two genes may suppress the malignant behaviors of CRC by regulating metabolism-related pathways.
It is worth noting that our study has some limitations.

Figure 3 .
Figure 3. Protein-protein interaction network for the overlapping upregulated differentially expressed genes.

Figure 4 .
Figure 4. Protein-protein interaction network for the overlapping downregulated differentially expressed genes.

Figure 5 .
Figure 5. Survival analysis.Kaplan-Meier plotter database for analyzing the correlation between gene expression and overall survival of rectal adenocarcinoma patients.

Figure 6 .
Figure 6.Validation of gene expression in CRC cell lines.(A-F) RT-qPCR analysis of mRNA levels of PLCE1, PTGS1, AMT, ST8SIA1, GBA2, and ST3GAL5 in CRC cell lines and the normal human colon epithelial cell line NCM460.(G) Western blotting for evaluating protein levels of GBA2 and ST3GAL5 in different cell lines.CRC, colorectal cancer; RT-qPCR, real-time quantitative polymerase chain reaction.**P < .01.

Figure 7 .
Figure 7. Overexpression efficiency of GBA2 or ST3GAL5 in CRC cells.RT-qPCR (A) and western blotting (B) for detecting GBA2 mRNA and protein expression in CRC cells with/without GBA2-OE transfection.RT-qPCR (C) and western blotting (D) for detecting ST3GAL5 mRNA and protein expression in CRC cells with/without ST3GAL5-OE transfection.CRC, colorectal cancer; mRNA, messenger ribonucleic acid; RT-qPCR, real-time quantitative polymerase chain reaction.**P < .01.

Table 1 .
Primer Sequences Used for Real-Time Quantitative

Table 2 .
Functional Enrichment Analysis of Differentially Expressed Genes

Table 2 .
The dDEGs Enriched in Metabolic Pathways(Continued)